
[InstCombine] Fold (trunc X) into X & Mask inside decomposeBitTestICmp#171195

Merged
dtcxzyw merged 4 commits into llvm:main from wermos:trunc
Jan 16, 2026

Conversation

Contributor

@wermos wermos commented Dec 8, 2025

Resolves #170020.

Added another case to the ICmp::EQ/ICmp::NE arm of the switch inside decomposeBitTestICmp to convert trunc X into X & Mask.
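The conversion can be sanity-checked with a small Python model of the two forms. This is only an illustrative sketch of the semantics, not the LLVM code; the bit widths and names are made up:

```python
def trunc_eq(x, c, src, dst):
    # Reference semantics: icmp eq (trunc iSRC x to iDST), C
    return x % (1 << dst) == c % (1 << dst)

def masked_eq(x, c, src, dst):
    # Decomposed form: icmp eq (and iSRC x, low-mask), (zext C to iSRC)
    mask = (1 << dst) - 1            # models APInt::getLowBitsSet(src, dst)
    return (x & mask) == (c & mask)  # for non-negative c, zext is the identity

# Exhaustive check over a tiny width pair (i6 -> i3).
assert all(trunc_eq(x, c, 6, 3) == masked_eq(x, c, 6, 3)
           for x in range(64) for c in range(8))
```

At small widths the check is exhaustive, which is why the toy pair i6 -> i3 is used instead of i64 -> i32.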

@wermos wermos requested a review from nikic as a code owner December 8, 2025 20:08
@llvmbot llvmbot added the llvm:instcombine (Covers the InstCombine, InstSimplify and AggressiveInstCombine passes), llvm:analysis (Includes value tracking, cost tables and constant folding), and llvm:transforms labels Dec 8, 2025
Member

llvmbot commented Dec 8, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Tirthankar Mazumder (wermos)

Changes

Addresses #170020.

I'm not exactly sure what kind of Alive2 proof is expected when the optimization relies on KnownBits, so I'm copying over the Alive2 proof for the specific case discussed in the issue: https://alive2.llvm.org/ce/z/K59kAt

I followed the suggestion given here:
> I'd suggest reusing computeKnownBitsFromICmpCond to compute known bits inferred from both conditions. If the union of known bits is a constant, convert the and/or into an equality test. It would be a bit tricky to select a suitable X.

To do this, I had to make computeKnownBitsFromICmpCond a part of the ValueTracking.h header.

I'm also not sure whether more tests are required.
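The KnownBits idea quoted above can be modeled with a toy sketch for the motivating case from the issue. This models the reasoning only; `Known` here is a simplified stand-in, not the LLVM `KnownBits` API:

```python
class Known:
    """Toy KnownBits: a bit set in `zeros` is known 0, in `ones` known 1."""
    def __init__(self, zeros=0, ones=0):
        self.zeros, self.ones = zeros, ones

    def union_with(self, other):
        # Both conditions hold simultaneously, so keep facts from either side.
        return Known(self.zeros | other.zeros, self.ones | other.ones)

    def is_constant(self, width):
        # Constant iff every bit is known one way or the other.
        return (self.zeros | self.ones) == (1 << width) - 1

# x <u 2**32 on an i64  =>  the high 32 bits of x are known zero.
known_l = Known(zeros=((1 << 32) - 1) << 32)
# trunc(x to i32) == 0  =>  the low 32 bits of x are known zero.
known_r = Known(zeros=(1 << 32) - 1)

combined = known_l.union_with(known_r)
assert not known_l.is_constant(64) and not known_r.is_constant(64)
assert combined.is_constant(64) and combined.ones == 0  # so fold to x == 0
```

Neither condition alone pins down x, but their union fixes all 64 bits to zero, which is exactly the case the equality-test fold targets.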


Full diff: https://github.com/llvm/llvm-project/pull/171195.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ValueTracking.h (+9)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+3-3)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+40)
  • (modified) llvm/test/Transforms/InstCombine/and-or-icmps.ll (+34-9)
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index b730a36488780..48cc85e719421 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -102,6 +102,15 @@ LLVM_ABI void computeKnownBitsFromContext(const Value *V, KnownBits &Known,
                                           const SimplifyQuery &Q,
                                           unsigned Depth = 0);
 
+/// Update \p Known with bits of \p V that are implied by \p Cmp.
+/// Comparisons involving `trunc V` are handled specially: known
+/// bits are computed for the truncated value and then extended to the bitwidth
+/// of \p V.
+LLVM_ABI void computeKnownBitsFromICmpCond(const Value *V, ICmpInst *Cmp,
+                                           KnownBits &Known,
+                                           const SimplifyQuery &SQ,
+                                           bool Invert);
+
 /// Using KnownBits LHS/RHS produce the known bits for logic op (and/xor/or).
 LLVM_ABI KnownBits analyzeKnownBitsFromAndXorOr(const Operator *I,
                                                 const KnownBits &KnownLHS,
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 9cb6f19b9340c..5ab5f8cfccc7f 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -968,9 +968,9 @@ static void computeKnownBitsFromCmp(const Value *V, CmpInst::Predicate Pred,
   }
 }
 
-static void computeKnownBitsFromICmpCond(const Value *V, ICmpInst *Cmp,
-                                         KnownBits &Known,
-                                         const SimplifyQuery &SQ, bool Invert) {
+void llvm::computeKnownBitsFromICmpCond(const Value *V, ICmpInst *Cmp,
+                                        KnownBits &Known,
+                                        const SimplifyQuery &SQ, bool Invert) {
   ICmpInst::Predicate Pred =
       Invert ? Cmp->getInversePredicate() : Cmp->getPredicate();
   Value *LHS = Cmp->getOperand(0);
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index ba5568b00441b..fa7c66d736c28 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -15,11 +15,13 @@
 #include "llvm/Analysis/CmpInstAnalysis.h"
 #include "llvm/Analysis/FloatingPointPredicateUtils.h"
 #include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/PatternMatch.h"
+#include "llvm/Support/KnownBits.h"
 #include "llvm/Transforms/InstCombine/InstCombiner.h"
 #include "llvm/Transforms/Utils/Local.h"
 
@@ -3376,9 +3378,13 @@ Value *InstCombinerImpl::foldAndOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,
   Value *LHS0 = LHS->getOperand(0), *RHS0 = RHS->getOperand(0);
   Value *LHS1 = LHS->getOperand(1), *RHS1 = RHS->getOperand(1);
 
+  // dbgs() << "LHS0 = " << *LHS0 << "\nLHS1 = " << *LHS1 << '\n';
+  // dbgs() << "RHS0 = " << *RHS0 << "\nRHS1 = " << *RHS1 << '\n';
+
   const APInt *LHSC = nullptr, *RHSC = nullptr;
   match(LHS1, m_APInt(LHSC));
   match(RHS1, m_APInt(RHSC));
+  // dbgs() << "LHSC = " << *LHSC << "\nRHSC = " << *RHSC << '\n';
 
   // (icmp1 A, B) | (icmp2 A, B) --> (icmp3 A, B)
   // (icmp1 A, B) & (icmp2 A, B) --> (icmp3 A, B)
@@ -3575,6 +3581,40 @@ Value *InstCombinerImpl::foldAndOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,
     return Builder.createIsFPClass(X, IsAnd ? FPClassTest::fcNormal
                                             : ~FPClassTest::fcNormal);
 
+  if (!IsLogical && IsAnd) {
+    auto TryCandidate = [&](Value *X) -> Value * {
+      if (!X->getType()->isIntegerTy())
+        return nullptr;
+
+      Type *Ty = X->getType();
+      unsigned BitWidth = Ty->getScalarSizeInBits();
+
+      // KnownL and KnownR hold information deduced from the LHS icmp and RHS
+      // icmps, respectively
+      KnownBits KnownL(BitWidth), KnownR(BitWidth);
+
+      computeKnownBitsFromICmpCond(X, LHS, KnownL, Q, /*Invert=*/false);
+      computeKnownBitsFromICmpCond(X, RHS, KnownR, Q, /*Invert=*/false);
+
+      KnownBits Combined = KnownL.unionWith(KnownR);
+
+      // Avoid stomping on cases where one icmp alone determines X. Those are handled by more specific InstCombine folds.
+      if (KnownL.isConstant() || KnownR.isConstant())
+        return nullptr;
+
+      if (!Combined.isConstant())
+        return nullptr;
+
+      APInt ConstVal = Combined.getConstant();
+      return Builder.CreateICmpEQ(X, ConstantInt::get(Ty, ConstVal));
+    };
+
+    if (Value *Res = TryCandidate(LHS0))
+      return Res;
+    if (Value *Res = TryCandidate(RHS0))
+      return Res;
+  }
+
   return foldAndOrOfICmpsUsingRanges(LHS, RHS, IsAnd);
 }
 
diff --git a/llvm/test/Transforms/InstCombine/and-or-icmps.ll b/llvm/test/Transforms/InstCombine/and-or-icmps.ll
index 290e344acb980..9d69fadfa9627 100644
--- a/llvm/test/Transforms/InstCombine/and-or-icmps.ll
+++ b/llvm/test/Transforms/InstCombine/and-or-icmps.ll
@@ -702,9 +702,9 @@ define i1 @PR42691_10_logical(i32 %x) {
 
 define i1 @substitute_constant_and_eq_eq(i8 %x, i8 %y) {
 ; CHECK-LABEL: @substitute_constant_and_eq_eq(
-; CHECK-NEXT:    [[C1:%.*]] = icmp eq i8 [[X:%.*]], 42
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i8 [[Y:%.*]], 42
-; CHECK-NEXT:    [[R:%.*]] = and i1 [[C1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i8 [[Y1:%.*]], 42
+; CHECK-NEXT:    [[R:%.*]] = and i1 [[TMP1]], [[TMP2]]
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %c1 = icmp eq i8 %x, 42
@@ -728,9 +728,9 @@ define i1 @substitute_constant_and_eq_eq_logical(i8 %x, i8 %y) {
 
 define i1 @substitute_constant_and_eq_eq_commute(i8 %x, i8 %y) {
 ; CHECK-LABEL: @substitute_constant_and_eq_eq_commute(
-; CHECK-NEXT:    [[C1:%.*]] = icmp eq i8 [[X:%.*]], 42
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i8 [[Y:%.*]], 42
-; CHECK-NEXT:    [[R:%.*]] = and i1 [[C1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i8 [[Y1:%.*]], 42
+; CHECK-NEXT:    [[R:%.*]] = and i1 [[TMP1]], [[TMP2]]
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %c1 = icmp eq i8 %x, 42
@@ -741,9 +741,9 @@ define i1 @substitute_constant_and_eq_eq_commute(i8 %x, i8 %y) {
 
 define i1 @substitute_constant_and_eq_eq_commute_logical(i8 %x, i8 %y) {
 ; CHECK-LABEL: @substitute_constant_and_eq_eq_commute_logical(
-; CHECK-NEXT:    [[C1:%.*]] = icmp eq i8 [[X:%.*]], 42
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i8 [[Y:%.*]], 42
-; CHECK-NEXT:    [[R:%.*]] = and i1 [[C1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i8 [[Y1:%.*]], 42
+; CHECK-NEXT:    [[R:%.*]] = and i1 [[TMP1]], [[TMP2]]
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %c1 = icmp eq i8 %x, 42
@@ -1392,12 +1392,12 @@ define i1 @bitwise_and_bitwise_and_icmps(i8 %x, i8 %y, i8 %z) {
 
 define i1 @bitwise_and_bitwise_and_icmps_comm1(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @bitwise_and_bitwise_and_icmps_comm1(
-; CHECK-NEXT:    [[C1:%.*]] = icmp eq i8 [[Y:%.*]], 42
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i8 [[Y:%.*]], 42
 ; CHECK-NEXT:    [[Z_SHIFT:%.*]] = shl nuw i8 1, [[Z:%.*]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = or i8 [[Z_SHIFT]], 1
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i8 [[X:%.*]], [[TMP1]]
-; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i8 [[TMP2]], [[TMP1]]
-; CHECK-NEXT:    [[AND2:%.*]] = and i1 [[C1]], [[TMP3]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i8 [[TMP2]], [[TMP1]]
+; CHECK-NEXT:    [[AND2:%.*]] = and i1 [[TMP3]], [[TMP4]]
 ; CHECK-NEXT:    ret i1 [[AND2]]
 ;
   %c1 = icmp eq i8 %y, 42
@@ -3721,3 +3721,28 @@ define i1 @merge_range_check_or(i8 %a) {
   %and = or i1 %cmp1, %cmp2
   ret i1 %and
 }
+
+; Just a very complicated way of checking if v1 == 0.
+define i1 @complicated_zero_equality_test(i64 %v1) {
+; CHECK-LABEL: @complicated_zero_equality_test(
+; CHECK-NEXT:    [[V5:%.*]] = icmp eq i64 [[V1:%.*]], 0
+; CHECK-NEXT:    ret i1 [[V5]]
+;
+  %v2 = trunc i64 %v1 to i32
+  %v3 = icmp eq i32 %v2, 0
+  %v4 = icmp ult i64 %v1, 4294967296 ; 2 ^ 32
+  %v5 = and i1 %v4, %v3
+  ret i1 %v5
+}
+
+define i1 @commuted_complicated_zero_equality_test(i64 %v1) {
+; CHECK-LABEL: @commuted_complicated_zero_equality_test(
+; CHECK-NEXT:    [[V5:%.*]] = icmp eq i64 [[V1:%.*]], 0
+; CHECK-NEXT:    ret i1 [[V5]]
+;
+  %v2 = trunc i64 %v1 to i32
+  %v3 = icmp ult i64 %v1, 4294967296 ; 2 ^ 32
+  %v4 = icmp eq i32 %v2, 0
+  %v5 = and i1 %v4, %v3
+  ret i1 %v5
+}


github-actions Bot commented Dec 8, 2025

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.


github-actions Bot commented Dec 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor Author

wermos commented Dec 8, 2025

I've addressed the email thing as well.

Contributor Author

wermos commented Dec 8, 2025

Ping @dtcxzyw for review.

Comment thread llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp Outdated
Contributor

andjo403 commented Dec 8, 2025

If the trunc is replaced by an and, this is already folded; see https://alive2.llvm.org/ce/z/Whfa65
I assume it is handled by

/// Handle (icmp(A & B) ==/!= C) &/| (icmp(A & D) ==/!= E).
/// Return the pattern classes (from MaskedICmpType) for the left hand side and
/// the right hand side as a pair.
/// LHS and RHS are the left hand side and the right hand side ICmps and PredL
/// and PredR are their predicates, respectively.
static std::optional<std::pair<unsigned, unsigned>>
getMaskedTypeForICmpPair(Value *&A, Value *&B, Value *&C, Value *&D, Value *&E,
Value *LHS, Value *RHS, ICmpInst::Predicate &PredL,
ICmpInst::Predicate &PredR) {
Maybe we can add support for trunc to a mask there? I don't know what the best solution is; I only thought of that function when I looked at the described fold.

Contributor

zwuis commented Dec 9, 2025

Addresses #170020.

You could use this format.

Member

dtcxzyw commented Dec 9, 2025

maybe can add support for trunc to a mask there?

Oh yes, we can simply handle this in decomposeBitTestICmp. The following patch works, but I think it should be moved into decomposeBitTestICmp.

diff --git a/llvm/lib/Analysis/CmpInstAnalysis.cpp b/llvm/lib/Analysis/CmpInstAnalysis.cpp
index a1a79e5685f8..362e7d0508a5 100644
--- a/llvm/lib/Analysis/CmpInstAnalysis.cpp
+++ b/llvm/lib/Analysis/CmpInstAnalysis.cpp
@@ -193,9 +193,25 @@ std::optional<DecomposedBitTest> llvm::decomposeBitTest(Value *Cond,
     // Don't allow pointers. Splat vectors are fine.
     if (!ICmp->getOperand(0)->getType()->isIntOrIntVectorTy())
       return std::nullopt;
-    return decomposeBitTestICmp(ICmp->getOperand(0), ICmp->getOperand(1),
+    if (auto Res = decomposeBitTestICmp(ICmp->getOperand(0), ICmp->getOperand(1),
                                 ICmp->getPredicate(), LookThruTrunc,
-                                AllowNonZeroC, DecomposeAnd);
+                                AllowNonZeroC, DecomposeAnd)) {
+      return Res;
+    }
+
+    CmpPredicate Pred;
+    Value *X;
+    const APInt *RHSC;
+    if (LookThruTrunc && match(Cond, m_ICmp(Pred, m_Trunc(m_Value(X)), 
+                               m_APInt(RHSC))) && (AllowNonZeroC || RHSC->isZero()) && ICmpInst::isEquality(Pred)) {
+      DecomposedBitTest Result;
+      Result.X = X;
+      unsigned BitWidth = X->getType()->getScalarSizeInBits();
+      Result.Mask = APInt::getLowBitsSet(BitWidth, RHSC->getBitWidth());
+      Result.C = RHSC->zext(BitWidth);
+      Result.Pred = Pred;
+      return Result;
+    }
   }
   Value *X;
   if (Cond->getType()->isIntOrIntVectorTy(1) &&
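The shape of that patch can be mirrored in a small Python sketch. The names echo the C++ above, but `DecomposedBitTest` here is a simplified stand-in for the LLVM struct, and the widths are illustrative:

```python
from typing import NamedTuple, Optional

class DecomposedBitTest(NamedTuple):
    mask: int   # models Result.Mask
    c: int      # models Result.C (RHS constant zero-extended to the source width)

def decompose_trunc_icmp(src_bits: int, dst_bits: int, rhs_c: int,
                         allow_nonzero_c: bool = True) -> Optional[DecomposedBitTest]:
    # Models decomposing: icmp eq/ne (trunc iSRC X to iDST), rhs_c
    if not allow_nonzero_c and rhs_c != 0:
        return None
    mask = (1 << dst_bits) - 1   # models APInt::getLowBitsSet(src_bits, dst_bits)
    return DecomposedBitTest(mask=mask, c=rhs_c)  # non-negative rhs_c: zext is identity

res = decompose_trunc_icmp(64, 32, 5)
assert res == (0xFFFFFFFF, 5)
# Equivalence on a sample value: trunc-compare vs. masked compare.
x = (7 << 32) | 5
assert (x % (1 << 32) == 5) == ((x & res.mask) == res.c)
```

The key invariant is that the mask keeps exactly the truncated-to bits, so comparing the masked source against the zero-extended constant is equivalent to the original trunc-and-compare.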

Contributor Author

wermos commented Dec 10, 2025

Alright, I'll work on moving the patch you shared into decomposeBitTestICmp.

Contributor Author

wermos commented Dec 11, 2025

I've redone the entire implementation. I removed my previous changes and modified decomposeBitTestICmp to convert the trunc X into X & Mask.

@wermos wermos changed the title [InstCombine] Fold (x < 2^32) & (trunc(x to i32) == 0) into x == 0 [InstCombine] Fold (trunc X) into X & Mask inside decomposeBitTestICmp Dec 11, 2025
Comment thread llvm/test/Transforms/InstCombine/and-or-icmps.ll

github-actions Bot commented Dec 11, 2025

🐧 Linux x64 Test Results

  • 188320 tests passed
  • 4998 tests skipped

✅ The build succeeded and all tests passed.


github-actions Bot commented Dec 11, 2025

🪟 Windows x64 Test Results

  • 129314 tests passed
  • 2857 tests skipped

✅ The build succeeded and all tests passed.

Contributor Author

wermos commented Dec 11, 2025

It looks like my changes affected a lot of tests. What is the protocol for determining whether these changes are regressions or not?

Contributor

nikic commented Dec 11, 2025

It looks like my changes affected a lot of tests. What is the protocol for determining whether these changes are regressions or not?

Regenerate the failing tests (you can use something like build/bin/llvm-lit --update-tests llvm/test/Transforms/InstCombine) and look at the diffs. (Feel free to push the changes.)

Contributor Author

wermos commented Dec 11, 2025

I manually checked the test diffs by feeding the changes into Alive2 and seeing whether the transformation is valid.

Contributor Author

wermos commented Dec 11, 2025

The test diffs I just pushed are all of the form trunc i32 %x to i8 and then icmp eq i8 %v1, %c (where %c is small enough to fit inside an i8 without overflowing).

I'm attaching an Alive2 link for this transform for the i32 -> i8 case, but I didn't bother writing a test for the i128 -> i64, i64 -> i32, and the i32 -> i16 cases. I eyeballed them and the masks look correct to me.
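The eyeballed masks for the other width pairs can also be checked mechanically. A randomized Python sketch (not part of the PR; it only models the trunc-vs-mask equivalence for each pair mentioned above):

```python
import random

def low_mask(dst_bits):
    # The mask the decomposition would use: the low dst_bits set.
    return (1 << dst_bits) - 1

# Width pairs mentioned above: i128->i64, i64->i32, i32->i16, i32->i8.
for src, dst in [(128, 64), (64, 32), (32, 16), (32, 8)]:
    mask = low_mask(dst)
    for _ in range(1000):
        x = random.getrandbits(src)
        c = random.getrandbits(dst)        # a constant that fits in the dst type
        trunc_form = (x % (1 << dst)) == c  # icmp eq (trunc x), c
        mask_form = (x & mask) == c         # icmp eq (and x, mask), zext(c)
        assert trunc_form == mask_form
```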

; CHECK-NEXT: [[ORIENTATIONS:%.*]] = alloca [1 x [1 x %struct.x]], align 8
; CHECK-NEXT: [[ORIENTATIONS:%.*]] = alloca [1 x [1 x [[STRUCT_X:%.*]]]], align 8
Contributor Author


I don't know why this line changed. Is this a harmless change? I'm not 100% sure, but it looks like a cosmetic (variable renaming) change to me.

Contributor


As this is the only change now, this file can be reverted.

Comment thread llvm/lib/Analysis/CmpInstAnalysis.cpp Outdated

Comment thread llvm/test/Transforms/InstCombine/and-or-icmps.ll Outdated
Contributor Author

wermos commented Jan 6, 2026

I've reverted the change in llvm/test/Transforms/InstCombine/getelementptr.ll as well. I can't reply to that review comment for some reason.

I'm going to close and reopen the PR to restart the CI because the AArch64 build failed due to a server timeout.

@wermos wermos marked this pull request as draft January 6, 2026 06:55
@wermos wermos marked this pull request as ready for review January 6, 2026 06:55
@wermos wermos closed this Jan 6, 2026
@wermos wermos reopened this Jan 6, 2026
@wermos wermos requested a review from dtcxzyw January 6, 2026 07:48
Contributor

@andjo403 andjo403 left a comment


Looks good to me; waiting for a second reviewer.

Contributor Author

wermos commented Jan 10, 2026

Ping @dtcxzyw @nikic for a second review

Member

@dtcxzyw dtcxzyw left a comment


LGTM. However, as I commented in #171195 (comment), the current implementation converts (trunc X) ==/!= C to (and X, mask) ==/!= 0 unconditionally and causes some regressions that cannot be reverted later. As a follow-up, we can try to preserve the previous behavior by checking whether the mask is getLowBitsSet(bit width of original operands). Then we can check the net effect.

Contributor Author

wermos commented Jan 15, 2026

Ping @nikic for final review

Member

dtcxzyw commented Jan 16, 2026

Ping @nikic for final review

Do you need me to merge this?

Contributor

@nikic nikic left a comment


LGTM

@dtcxzyw dtcxzyw merged commit 65533d3 into llvm:main Jan 16, 2026
10 checks passed

llvm-ci commented Jan 16, 2026

LLVM Buildbot has detected a new failure on builder arc-builder running on arc-worker while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/27400

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
88.984 [4/12/109] Linking CXX executable bin/clang-nvlink-wrapper
90.004 [4/11/110] Linking CXX executable bin/clang-diff
90.975 [4/10/111] Linking CXX executable bin/clang-import-test
91.767 [4/9/112] Linking CXX executable bin/clang-extdef-mapping
92.509 [4/8/113] Linking CXX executable bin/clang-installapi
95.285 [4/7/114] Linking CXX executable bin/clang-check
95.875 [4/6/115] Linking CXX executable bin/clang-refactor
97.790 [4/5/116] Linking CXX executable bin/clang-repl
98.016 [4/4/117] Linking CXX executable bin/clang-23
98.031 [3/4/118] Creating executable symlink bin/clang
command timed out: 1200 seconds without output running [b'ninja', b'-j', b'16'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1298.911171

Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…stICmp` (llvm#171195)

Resolves llvm#170020.

Added another case to the `ICmp::EQ`/`ICmp::NE` case in the switch
inside `decomposeBitTestICmp` to convert `trunc X` into a `X & Mask`.
Contributor

alexfh commented Feb 4, 2026

Hi @dtcxzyw,

we see a surprising interaction of this optimization and memory sanitizer, which starts complaining about a use of uninitialized value in the code looking roughly like this (https://gcc.godbolt.org/z/M1MKcznhx):

bool f(std::optional<int> x) {
  return x == 0 || x == 1;
}

IIUC, after SROA std::optional<int> is represented by an i64 of which only the low 33 bits are used: bit 33 is the has_value bit and the lower 32 bits are the value. After this commit, LLVM translates the code above into two comparisons of the lower 33 bits of this integer against constants:

  %0 = and i64 %x.coerce, 8589934591 ; 0x1ffffffff
  %retval.0.i = icmp eq i64 %0, 4294967296 ; 0x100000000
  %1 = and i64 %x.coerce, 8589934591
  %retval.0.i7 = icmp eq i64 %1, 4294967297 ; 0x100000001
  %2 = or i1 %retval.0.i, %retval.0.i7
  ret i1 %2

And the problem seems to be that the comparisons take into account potentially uninitialized lower 32 bits. Is this a valid thing to do in LLVM IR?

Member

dtcxzyw commented Feb 4, 2026

IIUC, after SROA std::optional<int> is represented by an i64 with 33 bits, where bit 33 is the has_value bit and the lower 32 bits are the value

To my knowledge, we don't model partial undef in LLVM. Can you find which transform breaks the check? If it involves two uses of the same value, we can simply add an isGuaranteedNotToBeUndef guard.

Contributor

alexfh commented Feb 4, 2026

IIUC, after SROA std::optional<int> is represented by an i64 with 33 bits, where bit 33 is the has_value bit and the lower 32 bits are the value

To my knowledge, we don't model partial undef in LLVM. Can you find which transform breaks the check? If it involves two uses of the same value, we can simply add an isGuaranteedNotToBeUndef guard.

@thurstond should know better what happens from the point of view of msan.

@wermos wermos deleted the trunc branch February 17, 2026 17:55

Development

Successfully merging this pull request may close these issues.

Missed Optimization: Fold (x < 2^32) & (trunc(x to i32) == 0) into x == 0
